Abstract

Traditionally, most football statistics and media coverage have focused almost exclusively on goals and (occasionally) shots. However, most of the duration of a football game is spent away from the boxes, passing the ball around. The way teams pass the ball around is the most characteristic measurement of what a team’s “unique style” is. In the present work we analyse passing sequences at the player level, using the different passing frequencies as a “digital fingerprint” of a player’s style. The resulting numbers provide an adequate feature set which can be used to construct a measure of similarity between players. Armed with such a similarity tool, one can try to answer the question: ‘Who might possibly replace Xavi at FC Barcelona?’

1 Introduction 

Association football (simply referred to as football in what follows) is arguably the most popular sport in the world. Traditionally, plenty of attention has been devoted to goals and their distribution as the main focus of football statistics. However, shots remain a rare occurrence in football games, to a much larger extent than in other team sports.

Long possessions and a paucity of scoring opportunities are defining features of football games. Passes, on the other hand, are two orders of magnitude more frequent than goals, and therefore constitute a much more appropriate event to look at when trying to describe the elusive quality of ‘playing style’. Some studies on passing have been performed, either at the level of passing sequence distributions (cf. [6], [9], [?]), by studying passing networks, from a dynamic perspective studying game flow [?], or through passing flow motifs at the team level [?], where passing flow motifs (developed following [10]) were satisfactorily shown by Gyarmati, Kwak and Rodriguez to set apart the passing styles of football teams from randomized networks.

In the present work we elaborate on that work by extending the flow motif analysis to the player level. We start by breaking down all possible 3-pass motifs into the different variations that result from labelling a distinguished node in the motif, resulting in a total of 15 different 3-pass motifs at the player level (stemming from the 5 motifs for teams). For each player in our dataset, and each game they participate in, we compute the number of times each pattern occurs. The resulting 15-dimensional distribution is used as a fingerprint for the player’s style, which characterizes what type of involvement the player has with his teammates.

The resulting feature vectors are then used to provide a notion of similarity between different football players, giving us a quantifiable measure of how close the playing styles of any two players are. This is done in two different ways: first by performing a clustering analysis (with automatic cluster detection) on the feature vectors, which allows us to identify 37 separate groups of similar players, and secondly by defining a distance function (based on the z-scores of the mean features), which is then used to construct the similarity score.

As an illustrative example, we perform a detailed analysis of all the defined quantities for Xavi Hernandez, captain of FC Barcelona, who has just left the team after many years in which he was considered the flagship of the famous tiki-taka style both for his club and for the Spanish national team. Using our data-based style fingerprint we try to address the pressing question: which player could possibly replace the best passer in the world?

2 Methodology 

The basis of our analysis is the study of passing subsequences. The passing style of a team is partially encoded, from a static point of view, in the passing network (cf. [?]). A more dynamical approach is taken by Gyarmati, Kwak and Rodriguez, where passing subsequences are classified (at the team level) through “flow motifs” of the passing network.

Inspired by the work on flow motifs for teams, we carry out a similar analysis at the player level. We focus on studying flow motifs corresponding to sequences of three consecutive passes. Passing motifs are not concerned with the names of the players involved in a sequence of passes, but rather with the structure of the sequence itself. From a team’s point of view, there are five possible variations: ABAB, ABAC, ABCA, ABCB, and ABCD (where each letter represents a different player within the sequence).



Figure 1: The five team flow motifs 


The situation is different when looking at flow motifs from a specific player’s point of view, as that player needs to be singled out within each passing sequence. Allowing for variation of a single player’s relative position within a passing sequence, the total number of motifs increases to fifteen. These patterns can all be obtained by swapping the position of player A with each of the other players (and relabelling if necessary) in each of the five motifs for teams. Adopting the convention that our singled-out player is always denoted by the letter ‘A’, the resulting motifs can be labelled as follows (the first motif in each row is the basic team motif):


ABAB, BABA

ABAC, BABC, BCBA

ABCA, BACB, BCAB

ABCB, BACA, BCAC

ABCD, BACD, BCAD, BCDA
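As a sanity check on this count, the fifteen player-level motifs can be generated mechanically from the five team motifs by singling out each player in turn and relabelling. The following Python sketch is only an illustration; the helper names `relabel` and `player_motifs` are hypothetical and not part of the original analysis.

```python
TEAM_MOTIFS = ["ABAB", "ABAC", "ABCA", "ABCB", "ABCD"]


def relabel(sequence, focus):
    """Rename players so that `focus` becomes 'A' and the remaining
    players become 'B', 'C', 'D' in order of first appearance."""
    names = {focus: "A"}
    spare = iter("BCD")
    for player in sequence:
        if player not in names:
            names[player] = next(spare)
    return "".join(names[player] for player in sequence)


def player_motifs(team_motif):
    """All player-level motifs obtained by singling out each player."""
    return sorted({relabel(team_motif, p) for p in team_motif})


# Flatten the per-team-motif lists into the full player-level catalogue.
ALL_PLAYER_MOTIFS = [m for t in TEAM_MOTIFS for m in player_motifs(t)]
```

The same `relabel` helper also classifies a concrete sequence: the four players involved in three consecutive passes, seen from the point of view of one of them, map directly to one of the fifteen labels.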

When tracking passing sequences, we consider only possessions consisting of uninterrupted consecutive events during which the ball is kept under control by the same team. As such, we consider that a possession ends any time the game gets interrupted or an action does not have a clear passing target. In particular, we consider that possessions get interrupted by fouls, by the ball going out of play, whenever there is a “divided ball” (e.g. an aerial duel), by clearances, interceptions, passes towards an open space without a clear target, or by shots, regardless of who gets to keep the ball afterwards. The motivation for this choice is that we are trying to keep track of game style through controlled, conscious actions. It is worth noting that here we are using a different methodology from the one in [?] (where passes are considered to belong to the same sequence if they are separated by less than five seconds).
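The possession rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the paper: the event format (tuples of team, event kind, player, receiver) is a hypothetical simplification of a real event feed, and any event that is not a pass with a clear target is treated as ending the possession.

```python
from collections import Counter


def split_possessions(events):
    """Yield one list of (passer, receiver) pairs per possession.

    `events` is a hypothetical chronological feed of
    (team, kind, player, receiver) tuples; receiver is None when the
    action has no clear passing target.
    """
    chain, current_team = [], None
    for team, kind, player, receiver in events:
        is_clean_pass = kind == "pass" and receiver is not None
        # The chain continues only if the same team passes again and the
        # passer is the player who received the previous pass.
        continues = (is_clean_pass and team == current_team
                     and chain and chain[-1][1] == player)
        if not continues:
            if chain:
                yield chain
            chain = []
        if is_clean_pass:
            chain.append((player, receiver))
            current_team = team
        else:
            current_team = None
    if chain:
        yield chain


def relabel(sequence, focus):
    """Rename players so `focus` is 'A', others 'B', 'C', 'D' in order."""
    names = {focus: "A"}
    spare = iter("BCD")
    for player in sequence:
        if player not in names:
            names[player] = next(spare)
    return "".join(names[player] for player in sequence)


def motif_counts(possession, focus):
    """Count the 3-pass motifs involving `focus` within one possession."""
    counts = Counter()
    for i in range(len(possession) - 2):
        window = possession[i:i + 3]
        players = [passer for passer, _ in window] + [window[-1][1]]
        if focus in players:
            counts[relabel(players, focus)] += 1
    return counts


# Made-up events, for illustration only.
events = [
    ("BAR", "pass", "Xavi", "Iniesta"),
    ("BAR", "pass", "Iniesta", "Messi"),
    ("BAR", "pass", "Messi", "Xavi"),
    ("BAR", "pass", "Xavi", "Alba"),
    ("BAR", "shot", "Alba", None),
    ("BAR", "pass", "Busquets", "Xavi"),
]
possessions = list(split_possessions(events))
xavi_counts = motif_counts(possessions[0], "Xavi")
```

In this toy feed the shot closes the first possession, so the final pass starts a new one that is too short to contain any 3-pass motif.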

Our data consists of all English Premier League games over the last five seasons (a total of 1900 games and 1402195 passes), all Spanish Liga games over the last three seasons (1140 games and 792829 passes), and the last season of Champions League data (124 games and 105993 passes). To reduce the impact of outliers, we have limited our study to players that have participated in at least 19 games (half a season). In particular, this means that only players playing in the English and Spanish leagues are tracked in our analysis. Unfortunately, at the time of writing we do not have at our disposal enough data about other major European leagues to make the study more comprehensive.

The resulting dataset contains a total of 1296 players. For each of the analysed players, we compute the average number of occurrences per game of each of the fifteen passing motifs listed above, and use the results as the feature vector describing the player’s style. For those parts of the analysis which require making different types of subsequences comparable, we replace the feature vector by the corresponding z-scores (where, for each feature, the mean and standard deviation are computed over all the players included in the study).
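The standardisation step can be sketched as below. The function name and the toy numbers are hypothetical, and the use of the population standard deviation is an assumption (the text does not say which estimator was used); the sketch also assumes every motif shows some variation across players, so no standard deviation is zero.

```python
from statistics import mean, pstdev


def zscore_features(features):
    """Standardise each motif over all players.

    `features` maps a player name to a list of per-motif averages,
    with the motifs in the same order for every player.
    """
    columns = list(zip(*features.values()))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # assumption: population std
    return {
        player: [(x - m) / s for x, m, s in zip(row, means, stds)]
        for player, row in features.items()
    }


# Made-up two-motif example, for illustration only.
toy = {"P1": [1.0, 10.0], "P2": [2.0, 20.0], "P3": [3.0, 30.0]}
z = zscore_features(toy)
```

After this transformation a rare motif and a frequent motif contribute on the same scale, which is what makes them comparable.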

Our analysis uses raw data for game events provided by Opta. Data munging, model fitting, analysis, and chart plotting were performed using IPython [?] and the Python scientific stack [?].

3 Analysis and results 

Summary statistics and motifs distributions 

A summary analysis of the passing motifs is shown in Table 1. Perhaps unsurprisingly, the maximum value for almost every single motif is reached by a player from FC Barcelona, the only exception being Yaya Toure.* Figure 3 shows the frequency distributions for player values at every kind of motif, and the relative position of Xavi within those distributions.


Motif   Mean   Std    Max     Player
ABAB    0.33   0.31    3.56   Dani Alves
ABAC    1.52   1.30    8.71   Thiago Alcantara
ABCA    0.90   0.73    5.99   Xavi
ABCB    1.53   1.08    7.69   Sergio Busquets
ABCD    6.03   3.62   25.53   Jordi Alba
BABA    0.33   0.29    2.72   Lionel Messi
BABC    1.53   1.07    7.33   Xavi
BACA    1.51   1.28    8.94   Thiago Alcantara
BACB    0.91   0.59    3.79   Xavi
BACD    6.01   4.17   27.21   Xavi
BCAB    0.91   0.58    3.93   Yaya Toure
BCAC    1.52   1.08    6.83   Jordi Alba
BCAD    6.00   4.11   28.89   Xavi
BCBA    1.53   1.03    8.29   Sergio Busquets
BCDA    6.01   3.47   23.64   Dani Alves

Table 1: Motif average values and players with highest values

* Toure did play for FC Barcelona; however, our dataset only contains games in which he played for Manchester City. Conversely, we only have data for Thiago Alcantara as a Barcelona player, as our dataset does not include the German Bundesliga.


We can see how Xavi dominates the passing, being the player with the highest numbers in five out of the fifteen motifs. Table 2 shows all the values and z-scores for Xavi. It is indeed remarkable that he manages to be consistently about four or more standard deviations above the average passing patterns. Particularly striking is his astonishing z-score of 6.95 in the ABCA motif, which corresponds to being the starting and finishing node of a triangulation. To put this number in context, if we were talking about random daily events, one would expect to observe such a strong deviation from the average approximately once every billion years.†
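The order of magnitude of this figure can be checked with a short calculation. The sketch below assumes a normally distributed quantity, a one-sided tail, and one independent observation per day; these are illustrative assumptions, and actual passing patterns do not satisfy them exactly. The function name is hypothetical.

```python
import math


def years_between_events(z, events_per_year=365.25):
    """Expected waiting time, in years, for a daily normal variable to
    exceed `z` standard deviations above its mean (one-sided tail)."""
    tail = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z) for standard normal
    return 1.0 / (tail * events_per_year)


years = years_between_events(6.95)  # on the order of 10**9 years
```

Under these assumptions the waiting time comes out at the scale of a billion years, in line with the statement above.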


Motif   Value   z-score
ABAB     1.57   3.97
ABAC     8.67   5.49
ABCA     5.99   6.95
ABCB     7.12   5.19
ABCD    21.44   4.26
BABA     1.71   4.71
BABC     7.33   5.41
BACA     8.58   5.51
BACB     3.79   4.88
BACD    27.21   5.08
BCAB     3.27   4.06
BCAC     6.78   4.86
BCAD    28.89   5.57
BCBA     7.08   5.40
BCDA    23.03   4.90

Table 2: Motif values and z-scores for Xavi

† From a very rigorous point of view, actual passing patterns are neither random nor normally distributed. Statistical technicalities notwithstanding, Xavi’s z-scores are truly off the charts!

Clustering and PCA

Figure 2: Principal Component Analysis, with labels for small AP clusters

Motif                 PC 1     PC 2
ABAB                  0.030    0.065
ABAC                  0.153   -0.019
ABCA                  0.084   -0.031
ABCB                  0.127   -0.091
ABCD                  0.437    0.150
BABA                  0.027    0.051
BABC                  0.114    0.257
BACA                  0.150   -0.040
BACB                  0.070    0.043
BACD                  0.514   -0.451
BCAB                  0.064    0.086
BCAC                  0.107    0.323
BCAD                  0.511   -0.310
BCBA                  0.123   -0.062
BCDA                  0.406    0.690
Explained variance    0.917    0.046

Table 3: Principal Components and their explained variance

Using the passing motif means as feature vectors, we performed a clustering analysis on our player set. The Affinity Propagation method with a damping coefficient of 0.9 yields a total of 37 clusters with varying numbers of players, listed in Table 7, where a representative player for every cluster is also listed. The explicit composition of each of the clusters of size at most 15 is shown in Table 8. Once again we can observe how the passing style of Xavi is different enough from everyone else’s that he gets assigned to a cluster of his own!
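For readers who wish to reproduce this kind of clustering, the sketch below runs Affinity Propagation with the damping coefficient of 0.9 quoted in the text. It assumes the scikit-learn implementation (the source only mentions "the Python scientific stack"), and it uses synthetic 15-dimensional data in place of the real feature vectors, so the clusters it finds have no football meaning.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Synthetic stand-in for the (players x 15 motifs) feature matrix.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(20, 15))
    for center in (0.0, 3.0, 6.0)
])

# The number of clusters is not specified in advance; the algorithm
# selects exemplars (representative points) on its own.
model = AffinityPropagation(damping=0.9, random_state=0).fit(features)
labels = model.labels_                      # cluster index per row
exemplars = model.cluster_centers_indices_  # row index of each exemplar
```

The exemplars play the role of the representative players reported for each cluster.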

Figure 2 shows the players’ feature vectors, plotted using the first two components of a Principal Component Analysis (after applying a whitening transformation to eliminate correlation). The PC coefficients, together with their explained variance ratios, are listed in Table 3. Looking at Figure 2, one can think of the first principal component (PC 1) as a measurement of overall involvement in the game, whereas the second principal component (PC 2) separates players depending on their positional involvement, with high positive values highlighting players playing on the wings with a strong attacking involvement, and smaller values relating to a more purely defensive involvement. Special mention in this respect goes to Dani Alves and Jordi Alba, who in spite of playing as fullbacks display a passing distribution more similar to that of forwards than to that of other fullbacks. The plot also shows how Xavi has the highest value for overall involvement and a balanced involvement between offensive and defensive passing patterns.
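A whitened two-component projection of this kind can be sketched with NumPy alone via the singular value decomposition. The function name and the random data are hypothetical, and this is one standard way of computing a whitened PCA rather than a record of the exact calls used for the paper.

```python
import numpy as np


def whitened_pca(features, n_components=2):
    """Return (components, explained variance ratios, whitened scores)."""
    centered = features - features.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained_ratio = (s ** 2) / np.sum(s ** 2)
    # Whitening: rescale each projected coordinate to unit variance.
    scores = u[:, :n_components] * np.sqrt(len(features) - 1)
    return vt[:n_components], explained_ratio[:n_components], scores


# Synthetic stand-in for the (players x 15 motifs) feature matrix.
rng = np.random.default_rng(0)
toy = rng.normal(size=(200, 15)) * np.linspace(3.0, 0.5, 15)
components, ratio, scores = whitened_pca(toy)
```

Each row of `components` holds the coefficients of one principal component over the fifteen motifs, which is the layout of Table 3.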


Player distance and similarity 

Our feature vector can be used to define a measure of similarity between players. Given a player $i$, let $v_i$ denote the vector of z-scores in passing motifs for player $i$. Our definition of distance between two players $i$ and $j$ is simply the Euclidean distance between the corresponding (z-score) feature vectors:

$$d(i, j) := \|v_i - v_j\|_2 = \sqrt{\sum_{m \in \text{motifs}} (v_{i,m} - v_{j,m})^2}$$

This distance can be used as a measure of similarity between players, allowing us to establish how closely related the passing patterns of any two given players are. In more concrete terms, the coefficient of similarity is defined by

$$s(i, j) := \frac{1}{1 + d(i, j)}$$

This similarity score is always between 0 and 1, with 1 meaning that two players display an identical passing pattern.
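The distance and similarity can be sketched in a few lines of Python. The similarity form 1 / (1 + d) used here is an inference rather than a quotation: it lies between 0 and 1, equals 1 for identical vectors, and reproduces the distance and similarity pairs reported in Table 6 up to rounding (for instance a distance of 4.495 against a similarity of 18.199%). The function names are hypothetical.

```python
import math


def distance(v_i, v_j):
    """Euclidean distance between two z-score feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v_i, v_j)))


def similarity(d):
    """Similarity score in (0, 1]; assumed form 1 / (1 + d)."""
    return 1.0 / (1.0 + d)


# Check against two rows of Table 6 (values in percent).
toure = 100 * similarity(4.495)
thiago = 100 * similarity(5.835)
```

Because the score decays with distance, even the closest player to Xavi stays below 20% similarity.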

The reason for choosing z-scores rather than raw values is to allow for a better comparison between different passing motifs, as using raw values would yield a distance dominated by the four motifs derived from ABCD, which occur with a frequency one order of magnitude higher than any other pattern. Table 4 shows a summary of the average and minimum distances for all the players in our dataset, showing that for an average player we can reasonably expect to find another one at a distance of 0.826 ± 0.5.



                 Mean     Closest
Avg value         4.471   0.826
Std deviation     1.800   0.500
Min value         3.188   0.178
Max value        19.960   5.134

Table 4: Average and closest player distances.

An immediate application of this is to find out, for a given player, who his closest peer is, that is, the player displaying the most similar passing pattern. Table 5 shows the minimum distances for the bottom 10 players (the ones with the smallest minimum distance, hence easier to replace) and the top 10 players (the ones with the highest minimum distance, thus harder to replace). Once again, we can see how the top 10 is dominated by FC Barcelona players.
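The closest-peer search can be sketched as a brute-force scan over all pairs, which is cheap for a dataset of this size. The function name and the toy vectors are hypothetical; real inputs would be the 15-dimensional z-score vectors.

```python
import math


def closest_peers(zscores):
    """Map each player to (closest other player, distance to that player)."""
    result = {}
    for player, vector in zscores.items():
        result[player] = min(
            ((other, math.dist(vector, w))
             for other, w in zscores.items() if other != player),
            key=lambda pair: pair[1],
        )
    return result


# Made-up two-dimensional vectors, for illustration only.
toy = {"P1": [0.0, 0.0], "P2": [0.0, 1.0], "P3": [5.0, 5.0]}
peers = closest_peers(toy)

# Players sorted from easiest to hardest to replace.
ranking = sorted(peers, key=lambda p: peers[p][1])
```

Sorting by the minimum distance gives the two ends of the ranking: players with a near twin at one end, isolated players at the other.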


Bottom 10                 Top 10
Player        Closest     Player        Closest
R Boakye      0.18        A Rangel      3.08
Tuncay        0.18        Neymar        3.26
J Arizmendi   0.23        Y Toure       3.92
J Roberts     0.23        T Alcantara   3.92
S Fletcher    0.23        A Iniesta     4.27
F Borini      0.23        J Alba        4.48
G Toquero     0.24        D Alves       4.48
Baba          0.24        Xavi          4.49
J Walters     0.25        L Messi       5.09
C Austin      0.25        S Busquets    5.13

Table 5: Players’ minimum distance (bottom 10 and top 10)


Note that in some cases the closest peer for a player happens to play for the same team, as is the case for Jordi Alba, whose closest peer is Dani Alves. We decided against excluding teammates from the closest-player search, as it would make the analysis overly complicated due to constant player movement between teams.

Table 5 shows that Xavi is amongst the hardest players to find a close replacement for. Table 6 shows the 20 players closest to Xavi. Among those, no one has a similarity score higher than 18.2%, and only ten players have a score higher than 10%.


Player              Distance   Similarity (%)
Yaya Toure          4.495      18.199
Thiago Alcantara    5.835      14.631
Sergio Busquets     6.494      13.345
Andres Iniesta      7.038      12.441
Cesc Fabregas       7.377      11.938
Jordi Alba          7.396      11.910
Toni Kroos          7.853      11.296
Mikel Arteta        8.257      10.802
Michael Carrick     8.505      10.521
Santiago Cazorla    8.515      10.509
Daley Blind         9.154       9.849
Paul Scholes        9.240       9.765
Gerard Pique        9.524       9.502
David Silva         9.640       9.398
Marcos Rojo         9.671       9.371
Angel Rangel        9.675       9.368
Samir Nasri         9.683       9.360
Leon Britton        9.797       9.261
Aaron Ramsey        9.821       9.241
Martin Montoya      9.846       9.220

Table 6: Distances and similarity scores of the 20 players closest to Xavi.


4 Conclusions and future work 

We have shown how the flow motif analysis can be extended from teams to players. Although there is an added level of complexity arising from the increased number of different motifs, the resulting data does a good job of classifying and discriminating players. Clustering analysis provides a reasonable grouping of players with similar characteristics, and the similarity score provides a quantifiable measure of how similar any two players are. We believe these tools can be useful for scouting and for early talent detection if implemented properly.

For future work, we plan to expand our dataset to cover all the major European leagues over a longer time span. A larger dataset would allow us to measure changes in style over a player’s career, and perhaps to isolate a team factor that would allow us to estimate what a player’s style would be if he were to switch teams. Another interesting thing to explore would be the density of each of the passing motifs according to pitch coordinates.

Coming back to our motivating question: who can replace Xavi at Barcelona? Among the ten players with a similarity score above 10%, three are already at Barcelona (Busquets, Iniesta and Jordi Alba), and another three used to play there but left (Toure, Alcantara and Fabregas). Arteta, Carrick and Cazorla are all in their thirties, ruling them out as long-term replacements, and Toni Kroos plays for Barcelona’s arch-rivals Real Madrid, making a move quite complicated (although not impossible, as current Barcelona manager Luis Enrique knows very well). The only choices for Barcelona therefore seem to be either to bring back Alcantara or Fabregas, or to convert Iniesta to play further away from the opposition box. A bolder move would be to sign the Dutch rising star Daley Blind (who used to play as a fullback, but has been tested as a midfielder over the last season in Van Gaal’s Manchester United), hoping that the youngster could rise to the challenge.

Xavi’s passing pattern stands out in every single metric we have used for our analysis. Isolated in his own cluster, and very far away from any other player, all the data seems to point to the fact that Xavi Hernandez is, literally, one of a kind.


Representative Player    Cluster size
Xavi                       1
Dani Alves                 2
Thiago Alcantara           4
David Silva                4
Gerard Pique               5
Bacary Sagna               6
Isco                       8
Chico                     10
Mahamadou Diarra          12
Jonny Evans               12
Jordan Henderson          15
Andreu Fontas             17
Christian Eriksen         17
Hugo Mallo                18
Victor Wanyama            19
Cesar Azpilicueta         19
Alberto Moreno            20
Gareth Bale               20
Fran Rico                 30
David de Gea              36
Antolin Alcaraz           36
Phil Jagielka             39
Sebastian Larsson         40
Liam Ridgewell            41
Emmerson Boyce            44
Nyom                      46
John Ruddy                48
Adam Johnson              52
Richmond Boakye           57
Chechu Dorado             61
Manuel Iturra             62
Loukas Vyntra             62
Kevin Gameiro             72
Borja                     73
Ruben Garcia              85
Gabriel Agbonlahor        90
Steven Fletcher          113

Table 7: Affinity propagation cluster sizes and representative players

Size   Players
1      Xavi
2      Dani Alves, Jordi Alba
4      David Silva, Lionel Messi, Samir Nasri, Santiago Cazorla
4      Andres Iniesta, Cesc Fabregas, Thiago Alcantara, Yaya Toure
5      Daley Blind, Gerard Pique, Javier Mascherano, Sergio Busquets, Toni Kroos
6      Adriano, Angel Rangel, Bacary Sagna, Gael Clichy, Marcelo, Martin Montoya
8      Emre Can, Isco, James Rodriguez, Juan Mata, Maicon, Mesut Ozil, Michael Ballack, Ryan Mason
10     Ashley Williams, Carles Puyol, Chico, Marc Bartra, Marcos Rojo,
       Michael Carrick, Mikel Arteta, Nemanja Matic, Paul Scholes, Sergio Ramos
12     Dejan Lovren, Garry Monk, John Terry, Jonny Evans,
       Ki Sung-yueng, Matija Nastasic, Michael Essien, Morgan Schneiderlin,
       Nabil Bentaleb, Per Mertesacker, Roberto Trashorras, Vincent Kompany
12     Aaron Ramsey, Alexandre Song, Fernandinho, Gareth Barry,
       Jerome Boateng, Jonathan de Guzman, Leon Britton, Luka Modric,
       Mahamadou Diarra, Mamadou Sakho, Steven Gerrard, Xabi Alonso
15     Ander Herrera, Eric Dier, Frank Lampard, Ivan Rakitic,
       Jamie O’Hara, Jordan Henderson, Michael Krohn-Dehli, Rafael van der Vaart,
       Ratinha, Sascha Riether, Scott Parker, Seydou Keita,
       Steven Davis, Vassiriki Abou Diaby, Wayne Rooney

Table 8: Affinity propagation clustering: Composition of small clusters

Figure 3: Passing motifs distributions